Device and method for converting two-dimensional video to three-dimensional video

ABSTRACT

There is provided parallax correction means for correcting a parallax for each area calculated by parallax calculation means in accordance with the magnitude of a motion vector for the area detected by motion vector detection means in order to prevent the three-dimensional effect of a conversion video from greatly differing depending on an input video when the MTD method and the CID method are simultaneously used. 
     When a depth estimate is converted into a parallax, the depth estimate is subjected to distance scale conversion, in order to suppress the distortion of the conversion image, to find a tentative target phase for each parallax calculation area; when the distortion is excessive, a dynamic range in which the phase difference between the parallax calculation areas is within a distortion allowable range is searched for, and the depth estimate is again subjected to distance scale conversion using that dynamic range, to find a tentative target phase. These operations are repeated.

TECHNICAL FIELD

The present invention relates to a device for and a method of converting a two-dimensional video into a three-dimensional video.

BACKGROUND ART

As a method of converting a two-dimensional video into a three-dimensional video, methods disclosed in JP-A-9-107562 and JP-A-10-51812 have been known.

The outline of the method of converting a two-dimensional video into a three-dimensional video, which is disclosed in JP-A-9-107562, will be first described on the basis of FIG. 1.

In a two-dimensional video (2D video), a state where a bird is flying from the left to the right in front of a mountain shall be picked up, as shown in images 1 to 5.

A motion vector between images, for example, a motion vector in the case of transition from the image 1 to the image 2 or a motion vector for transition from the image 2 to the image 3, is extracted for each of a plurality of motion vector detection areas set in a screen. A subject (bird) area and a background (mountain) area are then determined from the extracted motion vector. A reference image is determined to be one of a right eye image and a left eye image, and an image which is delayed by several fields corresponding to the magnitude of the motion vector is determined to be the other eye image such that a subject is located ahead of a background.

When it is assumed that the current image which is the reference image is the image 4, and an image (a delayed image) which is delayed by a predetermined number of fields depending on the magnitude of a motion vector obtained from the image 3 and the image 4 is the image 2, the reference image (the image 4) and the delayed image (the image 2) are respectively presented as a left eye image and a right eye image in the direction of the motion vector.

The operations are repeatedly performed, thereby displaying a video having a stereoscopic effect, that is, a three-dimensional video. This method shall be referred to as the MTD method.

The concept of the method of converting a two-dimensional video into a three-dimensional video, which is disclosed in JP-A-10-51812, will be described.

First, a two-dimensional image is divided into a plurality of areas, and image features such as a chrominance component, a high-frequency component, and a contrast are extracted for each of the areas obtained by the division. The areas obtained by the division are then grouped into areas to which the same object belongs on the basis of the chrominance component. A depth is estimated for the areas obtained by the grouping depending on information related to the average contrast and the average high-frequency component in the areas, to calculate a parallax. A left eye image and a right eye image are horizontally shifted in the opposite directions for the areas obtained by the grouping on the basis of the calculated parallax, to produce a three-dimensional video.

The left eye video and the right eye video, which are thus produced, are stereoscopically displayed on stereoscopic display means. This method shall be referred to as the CID method.

The MTD method and the CID method will be described in more detail.

1. MTD Method

In the MTD method, a video entering either one of the right and left eyes is delayed depending on the movement in a screen, to produce a stereoscopic effect. In this case, the field delay to be a target (a target delay dly_target) most suitable for the video is determined by the following equation (1) using an average of horizontal vectors in a subject area obj_xvec [pixel/field] and a horizontal vector in a background area bg_xvec [pixel/field], which are obtained by subject/background judgment. The vector takes a positive value with respect to rightward movement.

dly_target=Mdly_sisa/(obj_xvec−bg_xvec) [field]  (1)

Here, Mdly_sisa indicates a parallax [pixel] for determining a stereoscopic effect produced by the MTD method, and its value is previously set through a user interface or the like.

The direction of delay, showing which of the videos entering the right and left eyes should be delayed, is determined by the following equation (2) using the target delay dly_target:

dly_target>0 . . . delay of right eye
dly_target<0 . . . delay of left eye
dly_target=0 . . . no delay  (2)

Although the description above used the target delay for convenience, the number of fields by which the video is actually delayed and the direction of delay are determined by a real delay obtained by smoothing the target delay on a time basis.
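As a concrete illustration of equations (1) and (2), the following Python sketch computes the target delay, its direction, and a temporally smoothed real delay. All names are ours, and the first-order smoothing filter with gain k is an assumption; the source does not specify the smoothing method.

```python
# Sketch of the MTD target delay (equations (1) and (2)); names are ours.

def target_delay(obj_xvec, bg_xvec, Mdly_sisa):
    """Equation (1): target field delay from the subject and background
    horizontal vectors [pixel/field] and the desired MTD parallax [pixel]."""
    rel = obj_xvec - bg_xvec
    if rel == 0:                      # no relative motion: no delay
        return 0.0
    return Mdly_sisa / rel            # [field]

def delay_direction(dly_target):
    """Equation (2): which eye's video is delayed."""
    if dly_target > 0:
        return "delay right eye"
    if dly_target < 0:
        return "delay left eye"
    return "no delay"

def smooth_delay(real_delay, dly_target, k=0.1):
    """The real delay follows the target delay through temporal smoothing;
    a first-order IIR step with hypothetical gain k is one plausible form."""
    return real_delay + k * (dly_target - real_delay)
```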

2. Subject Position Control

Subject position control is employed in order to correct ambiguity, concerning the position where an object is presented relative to a screen, created when the MTD method is carried out. That is, in the MTD method, how a video is seen differs depending on which of a subject and a background moves, as shown in FIG. 2. In the subject position control, when the subject moves, the overall screen is moved backward by shifting the position where a right eye video is presented to the right and shifting the position where a left eye video is presented to the left so that the number of pixels from the subject to the screen is equal to the number of pixels from the screen to the background. On the other hand, when the background moves, the overall screen is moved forward by shifting the position where a right eye video is presented to the left and shifting the position where a left eye video is presented to the right so that the number of pixels from the subject to the screen is equal to the number of pixels from the screen to the background.

A horizontal phase t_phr of the right eye and a horizontal phase t_phl of the left eye, which are calculated by the subject position control, can be expressed by the following equation (4) when a phase obj_sisa of the subject and a phase bg_sisa of the background, which are produced by a field delay, are expressed by the following equation (3):

obj_sisa=obj_xvec*delay [pixel]
bg_sisa=bg_xvec*delay [pixel]  (3)

t_phr=(obj_sisa+bg_sisa)/2 [pixel]
t_phl=−t_phr [pixel]  (4)

Since the real delay is obtained by smoothing the target delay dly_target on a time basis, the absolute value of a parallax dly_sisa (=obj_sisa−bg_sisa) [pixel] produced by the MTD method (dly_sisa takes a positive value when the subject is projected, while taking a negative value when it is recessed) does not completely coincide with Mdly_sisa [pixel] previously determined by user setting. When there is no delay (dly_target=0), dly_sisa=0.
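The subject position control of equations (3) and (4) is simple enough to state as a short sketch. The function name is ours; delay is the real delay in fields obtained by the smoothing described above.

```python
# Sketch of subject position control (equations (3) and (4)); names are ours.

def subject_position_phases(obj_xvec, bg_xvec, delay):
    obj_sisa = obj_xvec * delay        # subject phase [pixel], eq. (3)
    bg_sisa = bg_xvec * delay          # background phase [pixel]
    t_phr = (obj_sisa + bg_sisa) / 2   # right-eye horizontal phase, eq. (4)
    t_phl = -t_phr                     # left-eye horizontal phase
    return t_phr, t_phl
```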

3. CID Method

The CID method is a method of dividing one screen into a plurality of areas, estimating a depth for each of the areas from image information obtained from the area and a composition, and shifting each of the pixels in the screen on the basis of the estimated depth, to produce a binocular parallax.

The applicant of the present invention has also developed a CID method which is a further improvement of the CID method already developed.

FIG. 3 shows the procedure for control in the CID method after the improvement (which is not publicly known).

First, one screen is divided into a plurality of areas, and information related to a high frequency, a contrast of luminance, and a chrominance (B-Y, R-Y) component is obtained from each of the areas (step 1). A depth estimate for each of the areas, which has been estimated from the information and the composition, is found (step 2). If the found depth estimate were merely converted into a shift amount, a distortion would be noticeable in the conversion image; therefore, distortion suppression processing is performed (step 3). The depth estimate after the distortion suppression processing is subjected to distance scale conversion (step 4).

The distortion suppression processing will be described. In the CID method, a 2D image is deformed, to produce left and right images. When the deformation is too large, an unnatural video is obtained. In the CID method, therefore, control is carried out such that the difference in phase between the adjacent areas is not more than a distortion allowable range h_supp_lev [pixel] of a conversion image which is previously determined by a user. That is, the difference in phase between the adjacent areas is found from the phases for the areas, which are found by assigning the estimated depth to the distance between Mfront and Mrear. The maximum value of the difference is taken as h_dv_max [pixel]. When h_dv_max exceeds the distortion allowable range h_supp_lev [pixel], Mfront and Mrear are reduced in the direction nearer to 0 [pixel] until the following equation (5) is satisfied:

h_dv_max≦h_supp_lev  (5)

When h_dv_max is larger than h_supp_lev, therefore, a projection phase front [pixel] and a recession phase rear [pixel] of the conversion image are made smaller than the maximum projection phase Mfront [pixel] and the maximum recession phase Mrear [pixel], which are previously determined by the user, by a linear operation expressed by the following equation (6), as illustrated in a diagram on the right side of FIG. 4:

front=Mfront*h_supp_lev/h_dv_max for h_dv_max>h_supp_lev
rear=Mrear*h_supp_lev/h_dv_max for h_dv_max>h_supp_lev  (6)

Conversely, when h_dv_max is smaller than h_supp_lev, the distortion of the conversion image is within the allowable range. Accordingly, the following equation (7) holds, as illustrated in a drawing on the left side of FIG. 4:

front=Mfront for h_dv_max≦h_supp_lev
rear=Mrear for h_dv_max≦h_supp_lev  (7)

That is, when h_dv_max is smaller than h_supp_lev, a dynamic range dv_range (=front−rear) in the phase of the conversion video is equal to a dynamic range Mdv_range (=Mfront−Mrear) in the phase previously determined by the user.

In the distortion suppression processing for suppressing the dynamic range in a real machine, h_supp_lev is replaced with a unit of an estimated depth in order to reduce a load on a CPU. For convenience, however, the description was made using a unit system of pixels.
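For reference, here is a minimal sketch of the linear suppression of equations (5) to (7), in the pixel unit system used in this description (a real machine would work in depth-estimate units, as noted above). The function name is ours.

```python
# Sketch of linear distortion suppression (equations (5)-(7)): shrink the
# dynamic range so the largest adjacent phase difference becomes allowable.

def suppress_range(Mfront, Mrear, h_dv_max, h_supp_lev):
    """Return (front, rear) [pixel] after distortion suppression."""
    if h_dv_max <= h_supp_lev:         # eq. (7): distortion already allowable
        return Mfront, Mrear
    scale = h_supp_lev / h_dv_max      # eq. (6): pull both phases toward 0
    return Mfront * scale, Mrear * scale
```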

Description is made of a distance scale conversion method.

In a two-lens stereoscopic display, a parallax W between corresponding points of a right eye image (an R image) and a left eye image (an L image) and a distance Yp from a screen actually viewed to a position where the images are merged together are in a non-linear relationship.

That is, when the R image and the L image which have a parallax W [mm] therebetween on the screen of the display are viewed from a position spaced a distance K [mm] apart from the screen, the distance Yp [mm] from the screen to the position where the images are merged together is expressed by the following equation (8):

Yp=KW/(W−2E)  (8)

In the foregoing equation (8), the variables respectively represent the following values:

K: a distance [mm] from the screen of the display to a viewer

E: a length [mm] which is one-half the distance between the eyes

W: a parallax [mm] between the corresponding points of the left eye image and the right eye image on the screen of the display

Yp: a distance [mm] from the screen to the position where the images are merged together

FIG. 5 shows the foregoing equation (8) graphically, letting K=1000 mm and 2E=65 mm.

FIG. 5 shows that a spatial distortion cannot be prevented from occurring in images to be merged together only by linearly replacing a depth estimate with a unit of pixels. In a distance scale method, therefore, the depth estimate is converted into the unit of pixels in consideration of the spatial distortion.

The distance scale conversion method will be briefly described.

The width of one pixel on the display is taken as U [mm]. When it is assumed that there is a parallax W corresponding to α pixels between the corresponding points, the parallax W is expressed by the following equation (9):

W=αU  (9)

By substituting the foregoing equation (9) in the foregoing equation (8), the relationship between the pixels and the position where the images are merged together is found, as expressed by the following equation (10):

Yp=KαU/(αU−2E)  (10)

Furthermore, the foregoing equation (10) is deformed, to obtain the following equation (11):

α=2E*Yp/{(Yp−K)U}  (11)
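Equations (8) to (11) can be checked numerically with a small sketch like the following. The default values K=1000 mm and 2E=65 mm are those used for FIG. 5; the pixel width U=0.3 mm is an assumed display parameter, and all names are ours.

```python
# Sketch of equations (8)-(11): on-screen parallax versus fusion distance.
# K, E, U defaults are illustrative assumptions (E is half the eye spacing).

def fusion_distance(alpha, K=1000.0, E=32.5, U=0.3):
    """Eqs. (9)-(10): distance Yp [mm] for a parallax of alpha pixels,
    each U mm wide, viewed from K mm away."""
    W = alpha * U                      # eq. (9): parallax in mm
    return K * W / (W - 2 * E)         # eq. (8)/(10); W = 2E is undefined

def parallax_for_distance(Yp, K=1000.0, E=32.5, U=0.3):
    """Eq. (11): parallax alpha [pixel] whose images merge at distance Yp."""
    return 2 * E * Yp / ((Yp - K) * U)
```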

In complete distance scale conversion, when the maximum projection amount Ymax′ from the screen and the maximum recession amount Ymin′ from the screen are designated, if a depth estimate depth (having a value from 0 to 100) is determined, a corresponding depth Yp can be obtained by simple scale conversion expressed by the following equation (12):

Yp=(Ymax′−Ymin′)×depth/100+Ymin′  (12)

A parallax α corresponding to Yp is found by the foregoing equation (11). Consequently, the depth estimate can be converted into a unit of pixels in consideration of the spatial distortion.

In the complete distance scale conversion, when a 256-stage parallax conversion table W″ is used, the space between Ymax′ and Ymin′ is first divided into 256 equal divisions, and a corresponding parallax conversion table W″ [pixel] is found for each depth Yp on the basis of the foregoing equation (11).

In this case, W″ [255] is a parallax corresponding to Ymax′, and W″ [0] is a parallax corresponding to Ymin′. If the depth estimate depth is determined, a corresponding parallax α is found from the following equation (13):

α=W″[lev]  (13)

Here, lev indicates the number of stages on the parallax conversion table, and is expressed by the following equation (14):

lev=255×depth/100  (14)
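A minimal sketch of the complete distance scale conversion of equations (11) to (14): build the 256-stage parallax conversion table between Ymin′ and Ymax′, then look up a depth estimate. The display parameters are the same assumptions as above, and the names are ours.

```python
# Sketch of complete distance scale conversion (equations (11)-(14)).

def build_parallax_table(Ymax, Ymin, K=1000.0, E=32.5, U=0.3):
    """256 equal divisions of [Ymin', Ymax'], converted by eq. (11)."""
    table = []
    for lev in range(256):
        Yp = Ymin + (Ymax - Ymin) * lev / 255.0
        alpha = 2 * E * Yp / ((Yp - K) * U)   # eq. (11)
        table.append(alpha)                   # W''[lev] in pixels
    return table

def depth_to_parallax(depth, table):
    """Eqs. (13)-(14): depth estimate (0..100) to parallax [pixel]."""
    lev = int(255 * depth / 100)              # eq. (14)
    return table[lev]                         # eq. (13)
```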

Although description was made of the complete distance scale conversion method in the 2D/3D conversion, the method has two problems, described below:

(1) When the maximum projection amount Ymax′ is increased until the depth Yp is saturated, the distortion of the conversion image itself (the distortions of the R image itself and the L image itself) is increased in a portion having a depth in the vicinity of Ymax′.

(2) When an attempt to enlarge a dynamic range in a depth reproduction space is made, there is no alternative but to reduce the maximum recession amount Ymin′. Accordingly, an area projected forward from the screen is extremely reduced.

In order to avoid the above-mentioned problems, the conversion must be carried out using only an area where there is some degree of proportionality between a depth and a parallax. However, this causes the complete distance scale conversion to be approximately the same as pixel scale conversion. It is therefore hard to say that the complete distance scale conversion is useful, given the complicated processing it requires.

Therefore, polygonal line distance scale conversion, introduced next, has been devised. In the polygonal line distance scale conversion, a projection amount ratio C [%] is introduced, to divide the space from Ymax′ to 0 into 255×C/100 equal divisions and the space from 0 to Ymin′ into 255×(100−C)/100 equal divisions, thereby finding a parallax conversion table, as shown in FIG. 7.

That is, the projection amount ratio C is controlled, thereby making it possible to change a projection amount forward from the screen and suppress the distortion of the conversion image itself in a portion where the projection amount reaches its maximum. In the polygonal line distance scale conversion, the equation corresponding to the foregoing equation (12) is the following equation (15):

Yp=Ymax′×{depth−(100−C)}/C for depth≧(100−C)
Yp={−Ymin′×depth/(100−C)}+Ymin′ for depth<(100−C)  (15)

Furthermore, the equation corresponding to the foregoing equation (14), representing the number of stages on the parallax conversion table W″, is the following equation (16):

lev=(255−Dlev)×{depth−(100−C)}/C+Dlev for depth≧(100−C)
lev=Dlev×depth/(100−C) for depth<(100−C)  (16)

Here, Dlev is defined by the following equation (17), and represents the number of stages, on the parallax conversion table, corresponding to the screen:

Dlev=(100−C)×255/100  (17)
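A minimal sketch of the piecewise table indexing of equations (16) and (17); the function name is ours.

```python
# Sketch of polygonal line distance scale indexing (equations (16)-(17)).
# Assumes 0 < C < 100 (the projection amount ratio in percent).

def polygonal_lev(depth, C):
    """Map a depth estimate (0..100) to a table stage lev (0..255)."""
    Dlev = (100 - C) * 255 / 100.0     # eq. (17): stage for the screen itself
    if depth >= (100 - C):             # ahead of the screen
        return (255 - Dlev) * (depth - (100 - C)) / C + Dlev
    return Dlev * depth / (100 - C)    # behind the screen, eq. (16)
```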

The polygonal line distance scale conversion is so carried out that no spatial distortion occurs ahead of and behind the screen. Conversely speaking, a spatial distortion occurs on the screen. This is based on the hypothesis that a spatial distortion is most difficult to perceive in the vicinity of the screen, drawn from the comment "when a stereoscopic video is viewed, how the video is seen differs ahead of and behind a screen" obtained from a lot of viewers.

As values actually employed, Ymax′, Ymin′, and C are determined such that the inclination (the step width) of the depth parallax conversion table does not greatly differ ahead of and behind the screen.

Meanwhile, the above-mentioned distortion suppression processing using the linear operation is effective for the pixel scale conversion. However, it cannot be said to be effective for the distance scale conversion. The reason is that the distance scale conversion has such properties that the parallax greatly differs ahead of and behind the screen even if the depth estimate is the same, for example "1", because the depth Yp and the parallax W [pixel] are non-linear. This tendency becomes significant in a large-screen display. In the polygonal line distance scale, which is an improvement of the complete distance scale, the projection amount ratio C is introduced partly in order to lessen these properties.

Even in the polygonal line distance scale capable of controlling the projection amount ratio C, however, the maximum value h_dv_max [pixel] of the phase difference between the adjacent areas cannot be completely suppressed within the distortion allowable range h_supp_lev [pixel] (the principle of suppressing a distortion in a pixel scale cannot be faithfully realized). In order to realize the principle of suppressing a distortion, distortion suppression processing must be performed after the distance scale conversion.

4. Simultaneous Use of MTD Method and CID Method

Generally, a human being perceives a feeling of distance at the time of stereoscopic view by, for example, the difference between dead angle portions (occlusion) of the images respectively entering his or her right and left eyes, caused by the difference between the positions of the right and left eyes. In this respect, the feeling of distance or the like can be reproduced by the MTD method. On the other hand, a video which does not move or a video whose movement is complicated cannot be satisfactorily converted into a three-dimensional video. In the CID method, a parallax between right and left eye images can be freely changed. On the other hand, the CID method cannot show a viewer a video in which the dead angle portions serving as a shadow of a subject differ between the right and left eyes depending on the parallax.

Therefore, it is considered that 2D/3D conversion is carried out simultaneously using the MTD method, which is effective for a moving picture, and the CID method, which is capable of also converting a still picture. In this case, it is considered that a parallax obtained by the MTD method and a parallax obtained by the CID method are simply added together.

However, the parallax obtained by the MTD method and the parallax obtained by the CID method are individually controlled. Accordingly, the parallax produced by the conversion greatly depends on the presence or absence of movement in an input video. That is, when the input video is a moving picture, both a parallax obtained by the MTD method and a parallax obtained by the CID method are reflected on a conversion video. When it is a still video, however, there is no parallax obtained by the MTD method, and there is only a parallax obtained by the CID method.

Such a phenomenon, in which the stereoscopic effect of a conversion video greatly differs depending on the input video, is inconvenient when a user adjusts the stereoscopic effect.

An object of the present invention is to provide a method of converting a two-dimensional video into a three-dimensional video, in which a stereoscopic effect of a conversion video can be prevented from greatly differing depending on an input video when the two-dimensional video is converted into the three-dimensional video simultaneously using the MTD method and the CID method.

Another object of the present invention is to provide a method of converting a two-dimensional video into a three-dimensional video, in which the distortion of a conversion image can be suppressed when a depth estimate is converted into a parallax using distance scale conversion.

DISCLOSURE OF INVENTION

[1] Description of device for converting two-dimensional video into three-dimensional video according to the present invention

A device for converting a two-dimensional video into a three-dimensional video according to the present invention is characterized by comprising a field memory for storing for each field a two-dimensional video signal inputted from a video signal source; motion vector detection means for detecting for each area of an input video a motion vector corresponding to movement between fields of the inputted video signal; readout means for reading out, out of the video signals stored in the field memory, the video signal delayed from the inputted video signal by a delay found from the motion vector for each area detected by the motion vector detection means; switching means for outputting one of the inputted video signal and the video signal read out of the field memory and the other video signal, respectively, as a left eye video signal and a right eye video signal on the basis of the direction of a horizontal component of the motion vector for each area detected by the motion vector detection means; feature extraction means for extracting for each area of the input video a video feature from the inputted video signal; parallax calculation means for calculating, on the basis of the image feature for each area of the input video extracted by the feature extraction means, a depth for the area and calculating a parallax for the area from the calculated depth for the area; parallax correction means for correcting the parallax for each area calculated by the parallax calculation means depending on the magnitude of the motion vector for the area detected by the motion vector detection means; and phase control means for correcting, on the basis of the parallax for each area corrected by the parallax correction means, phases for the area of the right eye video and the left eye video outputted by the switching means, and outputting the videos as a stereoscopic video signal.

Used as the parallax correction means is one comprising means for calculating for each area a difference parallax obtained by subtracting, from the parallax for the area calculated by the parallax calculation means, the parallax dependent on the magnitude of the motion vector in the corresponding area, and means for calculating a difference parallax for each area by changing a dynamic range such that the maximum value of the difference in the difference parallax between the adjacent areas is within a predetermined range.

It is preferable that there is provided means for reducing, when the sum of the difference parallax for each area obtained by the parallax correction means and the parallax dependent on the magnitude of the motion vector in the corresponding area exceeds a predetermined range, the delay by an amount corresponding to the excess parallax.
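A minimal sketch of the parallax correction described above, under our own names: the parallax dependent on the motion vector (the MTD parallax) is subtracted from the calculated parallax per area, and the dynamic range of the resulting difference parallax is changed so that the maximum difference between adjacent areas stays within the predetermined range. A simple linear rescaling stands in here for the dynamic-range change; the delay-reducing refinement of the preceding paragraph is omitted.

```python
# Sketch of the difference-parallax correction; all names are ours.
# `neighbors` is a list of (i, j) index pairs of adjacent areas.

def correct_parallax(cid_parallax, mtd_parallax, neighbors, limit):
    # difference parallax per area: CID parallax minus MTD parallax
    diff = [c - m for c, m in zip(cid_parallax, mtd_parallax)]
    # largest difference in difference parallax between adjacent areas
    h_dv_max = max(abs(diff[i] - diff[j]) for i, j in neighbors)
    if h_dv_max <= limit:
        return diff
    scale = limit / h_dv_max           # stand-in for the dynamic-range change
    return [d * scale for d in diff]
```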

[2] Description of first method of converting two-dimensional video into three-dimensional video according to the present invention

A first method of converting a two-dimensional video into a three-dimensional video according to the present invention is characterized by comprising a first step of storing for each field a two-dimensional video signal inputted from a video signal source in a field memory; a second step of detecting for each area of an input video a motion vector corresponding to movement between fields of the inputted video signal; a third step of reading out, out of the video signals stored in the field memory, a video signal delayed from the inputted video signal by a delay found from the motion vector for each area detected at the second step; a fourth step of outputting one of the inputted video signal and the video signal read out of the field memory and the other video signal, respectively, as a left eye video signal and a right eye video signal on the basis of the direction of a horizontal component of the motion vector for each area detected at the second step; a fifth step of extracting for each area of the input video a video feature from the inputted video signal; a sixth step of calculating, on the basis of the image feature for each area of the input video extracted at the fifth step, a depth for the area and calculating a parallax for the area from the calculated depth for the area; a seventh step of correcting the parallax for each area calculated at the sixth step depending on the magnitude of the motion vector for the area detected at the second step; and an eighth step of correcting, on the basis of the parallax for each area corrected at the seventh step, phases for the area of the right eye video and the left eye video outputted at the fourth step, and outputting the videos as a stereoscopic video signal.

Used as the seventh step is one comprising the steps of calculating for each area a difference parallax obtained by subtracting, from the parallax for the area calculated at the sixth step, the parallax dependent on the magnitude of the motion vector in the corresponding area, and calculating a difference parallax for each area by changing a dynamic range such that the maximum value of the difference in the difference parallax between the adjacent areas is within a predetermined range.

It is preferable that the method comprises the step of reducing, when the sum of the difference parallax for each area obtained at the seventh step and the parallax dependent on the magnitude of the motion vector in the corresponding area exceeds a predetermined range, the delay by an amount corresponding to the excess parallax.

[3] Description of second method of converting two-dimensional video into three-dimensional video according to the present invention

A second method of converting a two-dimensional video into a three-dimensional video according to the present invention is characterized by comprising a first step of extracting an image feature related to the long or short distance of a video from each of a plurality of parallax calculation areas set within one screen on the basis of a two-dimensional video signal, and producing a depth estimate for the parallax calculation area on the basis of the extracted image feature; a second step of subjecting each of the depth estimates to distance scale conversion using a dynamic range defined by a predetermined maximum projection amount and a predetermined maximum recession amount, to find a tentative target phase for each of the parallax calculation areas; a third step of finding the maximum value of a phase difference between the adjacent parallax calculation areas on the basis of the tentative target phase for each of the parallax calculation areas; a fourth step of judging whether or not the maximum value of the phase difference between the adjacent parallax calculation areas is within a predetermined distortion allowable range; and a fifth step of searching, when the maximum value of the phase difference between the adjacent parallax calculation areas is outside the predetermined distortion allowable range, for such a dynamic range that the phase difference between the parallax calculation areas is within the distortion allowable range, subjecting each of the depth estimates to distance scale conversion using the dynamic range searched for, and finding a tentative target phase for each of the parallax calculation areas, to proceed to the third step.

Here, the distance scale conversion means a method of converting a depth estimate into a unit of pixels (a parallax) in consideration of the position where images are merged together. Contrary to this, a method of linearly converting a depth estimate into a unit of pixels (a parallax) is referred to as pixel scale conversion.

At the fifth step, the dynamic range searched for may be corrected such that the ratio of the maximum projection amount to the maximum recession amount, which are defined by the dynamic range, is a predetermined ratio, and each of the depth estimates may then be subjected to distance scale conversion using the corrected dynamic range.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic view for explaining the conventional MTD method.

FIG. 2 is a schematic view for explaining subject position control.

FIG. 3 is a flow chart showing the procedure for control processing in the conventional CID method.

FIG. 4 is a schematic view for explaining dynamic range suppression processing in the conventional CID method.

FIG. 5 is a graph showing the relationship between a parallax W between images and the position Yp where the images are merged together.

FIG. 6 is a graph for explaining complete distance scale conversion.

FIG. 7 is a graph for explaining polygonal line distance scale conversion.

FIG. 8 is a graph showing such properties that a parallax W [pixel] greatly differs ahead of and behind a screen even if a depth estimate is the same, for example, "1", because a depth Yp and the parallax are non-linear.

FIG. 9 is a flow chart showing the procedure for control in the CID method according to a first embodiment of the present invention.

FIG. 10 is a flow chart showing distance scale conversion and distortion suppression processing at the step 13 shown in FIG. 9.

FIG. 11 is a graph showing that a depth relationship ahead of and behind a screen is held even if a dynamic range is changed, by introducing a method of maintaining a distance ratio.

FIG. 12 is a schematic view showing a case where polygonal line distance scale conversion for only maintaining an amount ratio is carried out and a case where processing for maintaining a distance ratio is performed.

FIG. 13 is a diagram showing the schematic configuration of a 2D/3D converting device according to a second embodiment.

FIG. 14 is a flow chart showing the operations of the 2D/3D converting device shown in FIG. 13.

FIG. 15 is a diagram showing the schematic configuration of a 2D/3D converting device according to a third embodiment.

FIG. 16 is a flow chart showing the procedure for overall integrated phase control processing.

FIG. 17 is a schematic view showing overall behavior in integrated phase control.

FIG. 18 is a schematic view showing behavior at each area at the time of integrated phase control.

FIG. 19 is a flow chart showing the detailed procedure for processing at steps 53, 54, and 55 shown in FIG. 16.

FIG. 20 is a schematic view showing an example of distortion suppression performed in integrated phase control.

FIG. 21 is a schematic view showing a case where a phase must be adjusted by the MTD method and a case where it need not be adjusted.

FIG. 22 is a schematic view for explaining phase suppression processing by the MTD method.

FIG. 23 is a flow chart showing the procedure for delay suppression processing.

BEST MODE FOR CARRYING OUT THE INVENTION

[1] Description of First Embodiment

Referring now to FIGS. 9 to 12, a first embodiment of the present invention will be described.

FIG. 9 shows the procedure for control in the CID method according to the first embodiment of the present invention.

First, one screen is divided into a plurality of areas, and information related to a high frequency, a contrast of luminance, and a chrominance (B-Y, R-Y) component is obtained from each of the areas (step 11). A depth estimate for each of the areas, which has been estimated from the information and a composition, is found (step 12). The found depth estimate is subjected to distance scale conversion and distortion suppression processing, thereby obtaining a target phase (step 13).

FIG. 10 shows the details of the distance scale conversion and distortion suppression processing at the step 13 shown in FIG. 9.

First, the depth estimate is subjected to the distance scale conversion in a dynamic range defined by Mfront and Mrear, to obtain a tentative target phase (steps 21 and 22). The maximum value h_dv_max [pixel] of a phase difference between the adjacent areas is calculated on the basis of the obtained tentative target phase (step 23).

It is judged whether or not the maximum value h_dv_max [pixel] of the phase difference between the adjacent areas is within a distortion allowable range h_supp_lev [pixel] (step 24). When it is within the allowable range, the tentative target phase is taken as a true target phase (step 27).

When the maximum value of the phase difference between the adjacent areas is outside the distortion allowable range, the dynamic range defined by Mfront and Mrear is gradually reduced until the maximum value of the phase difference is not more than h_supp_lev, thereby obtaining the most suitable values of front and rear (step 25). For convenience, the processing at the step 25 shall be referred to as sequential search processing. The details of the sequential search processing will be described later.

After front and rear are changed such that the distance ratio of front to rear, which are found by the sequential search processing, is a distance ratio designated by the user (step 26), the program returns to the step 22. At the step 22, distance scale conversion is carried out again.

The processing at the steps 22, 23, 24, 25, and 26 is repeated until the maximum value h_dv_max [pixel] of the phase difference between the adjacent areas is within the distortion allowable range h_supp_lev [pixel], to obtain a final target phase. The distance scale conversion is carried out every time the dynamic range is changed in order to accurately realize the distance scale principle that a viewer is made to perceive a stereoscopic video conforming to the depth estimate independently of a spatial distortion in a stereoscopic display.
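The loop of FIG. 10 can be summarized in a short sketch. distance_scale, shrink_range, and hold_ratio are stand-ins for the distance scale conversion, the sequential search processing, and the distance-ratio adjustment described in the text; all names are ours.

```python
# Sketch of the control loop of FIG. 10 (steps 21-27); helper callables
# stand in for the operations described in the text.

def cid_target_phase(depths, neighbors, Mfront, Mrear, h_supp_lev,
                     distance_scale, shrink_range, hold_ratio):
    front, rear = Mfront, Mrear
    while True:
        # steps 21-22: distance scale conversion in the current dynamic range
        phases = distance_scale(depths, front, rear)
        # step 23: maximum phase difference between adjacent areas
        h_dv_max = max(abs(phases[i] - phases[j]) for i, j in neighbors)
        if h_dv_max <= h_supp_lev:
            return phases              # steps 24 and 27: true target phase
        # step 25: sequential search shrinks the range toward 0
        front, rear = shrink_range(front, rear, phases)
        # step 26: re-apply the user-designated distance ratio
        front, rear = hold_ratio(front, rear)
```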

The sequential search processing will now be described.

In a distance scale where a depth estimate and a phase are non-linear, the dynamic range defined by the values of front and rear is determined by the sequential search processing so as to keep the dynamic range as large as possible.

In the sequential search processing, a parallax can also be calculated using the depth parallax conversion equation (equation (15)). However, calculation using a parallax conversion table W″ previously calculated in the following manner is more efficient. This method will be described, taking the depth estimate at the screen level, in a case where it is standardized between 0 and 100, as surface_depth (=100−C).

When the numbers of stages, on the parallax conversion table W″, corresponding to the front value and the rear value are respectively taken as Max_lev (=255 to Dlev) and Min_lev (=Dlev to 0), the number of stages lev, on the parallax conversion table, corresponding to a certain depth estimate v_depth is expressed by the following equation (18):

lev=(v_depth−surface_depth)*(Max_lev−Dlev)/(100−surface_depth)+Dlev for v_depth>surface_depth
lev=(v_depth−Min_lev)*(Dlev−0)/(surface_depth−Min_lev) for v_depth<surface_depth
lev=Dlev for v_depth=surface_depth  (18)

A phase phase corresponding to lev is uniquely found by the parallax conversion table W″, and can therefore be expressed by the following equation (19):

phase=W″(lev)  (19)

In the sequential search processing, the front value and the rear value at which the maximum phase difference between the adjacent areas is not more than h_supp_lev can be found by gradually changing Max_lev and Min_lev.
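A minimal sketch of this table lookup, mirroring equations (18) and (19); the variable names follow the text, and the function name is ours.

```python
# Sketch of the table lookup of equations (18)-(19), mirrored as written.

def depth_to_phase(v_depth, surface_depth, Max_lev, Min_lev, Dlev, W):
    """Map a depth estimate v_depth (0..100) to a phase via the table W''."""
    if v_depth > surface_depth:        # ahead of the screen
        lev = ((v_depth - surface_depth) * (Max_lev - Dlev)
               / (100 - surface_depth) + Dlev)
    elif v_depth < surface_depth:      # behind the screen, per eq. (18)
        lev = (v_depth - Min_lev) * Dlev / (surface_depth - Min_lev)
    else:
        lev = Dlev                     # on the screen
    return W[int(lev)]                 # eq. (19): phase = W''(lev)
```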

As apparent from the foregoing equation (18), in the sequential search processing, the methods of searching for the most suitable values of front and rear in accordance with the relationship between the phases in the adjacent two areas between which there is a maximum phase difference are of the following three types (see also the sketch after the third case):

First case: when both the areas respectively have phases ahead of the screen, the front value is brought near to zero (Max_lev is brought near to Dlev).

Second case: when both the areas respectively have phases behind the screen, the rear value is brought near to zero (Min_lev is brought near to Dlev).

Third case: when one of the areas has a phase ahead of the screen, and the other area has a phase behind the screen, both the front value and the rear value are brought near to zero (Max_lev and Min_lev are brought near to Dlev).

In the third case, Max_lev and Min_lev are brought near to Dlev so as to hold the distance ratio previously designated by the user, that is, so as to hold a relationship expressed by the following equation (20) at the time of the sequential search processing:

(255−Dlev):Dlev=(Max_lev−Dlev):(Dlev−Min_lev)  (20)

The foregoing equation (20) is synonymous with the processing at the step 26 shown in FIG. 10. In the first case and the second case, processing for changing the distance ratio is performed not at the time of the sequential search processing but at the step 26 shown in FIG. 10 in order to reduce the amount of operation.
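A minimal sketch of how equation (20) can be enforced: whichever side has been shrunk more determines a common shrink factor, so that (Max_lev−Dlev):(Dlev−Min_lev) keeps the designated ratio. The function name and the min-based choice are ours.

```python
# Sketch of the distance-ratio constraint of equation (20); names are ours.

def hold_distance_ratio(Max_lev, Min_lev, Dlev):
    """Return (Max_lev', Min_lev') satisfying
    (255 - Dlev) : Dlev = (Max_lev' - Dlev) : (Dlev - Min_lev')."""
    # shrink factor of each side relative to its full range
    k_front = (Max_lev - Dlev) / (255 - Dlev) if Dlev < 255 else 0.0
    k_rear = (Dlev - Min_lev) / Dlev if Dlev > 0 else 0.0
    k = min(k_front, k_rear)           # the more-shrunk side governs
    return Dlev + k * (255 - Dlev), Dlev - k * Dlev
```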

A method of maintaining the distance ratio is introduced in the distance scale in order to hold a depth relationship ahead of and behind the screen even if the dynamic range is changed. Specifically, as shown in FIG. 11, when a distance ahead of the screen is reduced by 20% in the first case, a distance behind the screen is also reduced by 20%, to maintain the relationship ahead of and behind the screen.

When the distance ratio is maintained on the parallax conversion table, the projection amount ratio C can also be maintained. Consequently, a conversion video having no uncomfortable feeling can be presented to a viewer, who tends to perceive a space in terms of relative depth.

However, depending on the characteristics of the eyes of the viewer, there are cases where the video is better when the overall dynamic range is widened. In such cases, only the amount ratio, not the distance ratio, is maintained in the first case and the second case.

FIG. 12 illustrates a case where polygonal line distance scale conversion for only maintaining the amount ratio is carried out and a case where processing for maintaining the distance ratio is further performed. In the polygonal line distance scale conversion for only maintaining the amount ratio, a correspondence between a depth estimate on the screen and the parallax conversion table is established by carrying out separate range conversions with the depth estimate as their boundary. In a case where the distance ratio is maintained, the depth estimate and the parallax conversion table can correspond to each other by one range conversion. A function lev (phase) in FIG. 12 represents an inverse function of the foregoing equation (19), and means that the number of stages on the parallax conversion table is found from a phase phase [pixel].

[2] Description of Second Embodiment

Referring now to FIGS. 13 and 14, a second embodiment of the present invention will be described.

In FIG. 13, reference numeral 1 denotes a video supply source serving as video signal supply means such as a VTR (Video Tape Recorder), a CD-ROM (Compact Disc Read-Only Memory), or TV broadcasting, reference numeral 2 denotes a 2D/3D converting device for converting a two-dimensional video signal supplied from the video supply source 1 into a three-dimensional video signal, that is, a left eye video signal L and a right eye video signal R, and reference numeral 3 denotes stereoscopic display means using an image splitter system or the like for displaying the three-dimensional video signal outputted from the 2D/3D converting device 2.

Description is made of the configuration of the 2D/3D converting device 2.

Reference numeral 4 denotes a field memory storing as a video the video signal supplied from the video supply source 1 for each field, and reference numeral 5 denotes motion vector detection means for detecting a motion vector from the video signal supplied from the video supply source 1.

Reference numeral 6 denotes chrominance extraction means for extracting a chrominance component from the video supplied from the video supply source 1, reference numeral 7 denotes contrast extraction means for extracting a contrast from the video supplied from the video supply source 1, and reference numeral 8 denotes high-frequency component extraction means for extracting a high-frequency component from the video supplied from the video supply source 1. The chrominance extraction means 6, the contrast extraction means 7, and the high-frequency component extraction means 8 constitute image feature extraction means.

Reference numeral 9 denotes movement amount calculation means for finding, from the motion vector detected by the motion vector detection means, the direction of movement and the amount of movement (the magnitude of the motion vector), and outputting the direction and amount of movement, reference numeral 10 denotes memory control means for reading out from the field memory 4 an image (a delayed image) delayed by the number of fields corresponding to the amount of movement outputted from the movement amount calculation means 9 using the current image as a basis, and reference numeral 11 denotes switching means for switching as to which of the reference image (the current image) and the delayed image should be outputted as a left eye video signal L or a right eye video signal R on the basis of the direction of movement outputted from the movement amount calculation means 9.

Reference numeral 12 denotes grouping means for grouping areas by portions which can be judged to be the same object such as a subject or a background, depending on the chrominance component extracted by the chrominance extraction means 6 and the amount and direction of movement calculated by the movement amount calculation means 9 from the video supplied from the video supply source 1, and outputting information related to the grouping, and reference numeral 13 denotes first depth map production means for calculating depth information from the amount of movement calculated by the movement amount calculation means 9 and the grouping information obtained by the grouping means 12, to produce a depth map.

Reference numeral 14 denotes second depth map production means for calculating depth information from information related to the contrast extracted by the contrast extraction means 7 for the areas obtained by the grouping in the grouping means 12, to produce a depth map, reference numeral 15 denotes third depth map production means for calculating depth information from information related to the high-frequency component extracted by the high-frequency component extraction means 8 for the areas obtained by the grouping in the grouping means 12, to produce a depth map, and reference numeral 16 denotes fourth depth map production means for calculating depth information from information related to a composition previously set and information related to the areas obtained by the grouping in the grouping means 12, to produce a depth map.

Reference numeral 17 denotes composite map production means for weighting, adding, and synthesizing the depth maps produced by the depth map production means 13, 14, 15, and 16, to produce a composite map, reference numeral 18 denotes parallax calculation means for calculating a parallax for each parallax calculation area previously set from the composite map produced by the composite map production means 17, and reference numeral 19 denotes horizontal position setting means for shifting the left and right eye images outputted from the switching means 11 in the horizontal direction in units of pixels, for example, to synthesize the images on the basis of the parallax for each parallax calculation area calculated by the parallax calculation means 18.

The parallax calculation means 18 considers the amount of frame delay on the basis of the amount of movement calculated by the movement amount calculation means 9, thereby correcting, that is, reducing the parallax outputted to the horizontal position setting means 19 depending on the amount of movement.

FIG. 14 shows the operations of the 2D/3D converting device 2.

A video signal supplied from the video supply source 1 is stored in the field memory 4 for each field (step 31). A motion vector is detected from the two-dimensional video signal supplied from the video supply source 1 by the motion vector detection means 5, and the amount of movement and the direction of movement of the motion vector are calculated by the movement amount calculation means 9 (step 32). Specifically, the motion vector detection means 5 compares an image in the current field with an image which is one field preceding the current field, to extract as a motion vector the amount of movement and the direction of movement of a subject in the image.

An image (a delayed image) delayed by a predetermined number of fields from the two-dimensional video signal (the reference image) from the video supply source 1 is then read out of the field memory 4 and is fed to the switching means 11 depending on the amount of movement of the motion vector detected at the step 32 (step 33). The switching means 11 outputs one of the reference image and the delayed image as a left eye video signal L and outputs the other image as a right eye video signal R on the basis of the direction of movement of the motion vector detected at the step 32.

The operations at the foregoing steps 31 to 33 correspond to operations in the MTD method.

Image features are then extracted on the basis of the two-dimensional video signal from the video supply source 1 (step 34). An image area corresponding to one field is divided into a plurality of areas, so that a plurality of image feature detection areas are set within the image area corresponding to one field. The chrominance extraction means 6 extracts chrominance information for each of the image feature detection areas. The contrast extraction means 7 extracts a contrast for each of the image feature detection areas. The high-frequency extraction means 8 extracts a high-frequency component for each of the image feature detection areas. Further, the grouping means 12 groups the areas in the image on the basis of the chrominance information for each of the image feature detection areas extracted by the chrominance extraction means 6 and the amount of movement detected at the step 32 in order to use the areas for judging a subject, a background, or the like.

A depth map is then produced (step 35). That is, the first depth map production means 13 produces a first depth map on the basis of the amount of movement of the motion vector calculated by the movement amount calculation means 9 and the grouping information obtained by the grouping means 12.

The second depth map production means 14 produces a second depth map on the basis of the contrast for each of the image feature detection areas extracted by the contrast extraction means 7 and the grouping information obtained by the grouping means 12. The third depth map production means 15 produces a third depth map on the basis of the high-frequency component for each of the image feature detection areas extracted by the high-frequency extraction means 8 and the grouping information obtained by the grouping means 12.

Furthermore, the fourth depth map production means 16 produces a fourth depth map on the basis of a composition on a screen previously set (for example, such a composition that if the screen is mainly composed of a landscape, a lower portion of the screen is the ground, an upper portion of the screen is the sky, and the center of the screen is a subject) and the grouping information obtained by the grouping means 12.

A composite depth map is then produced (step 36). That is, the composite map production means 17 weights and adds the first to fourth depth maps produced by the first to fourth depth map production means 13, 14, 15, and 16, to produce the composite depth map.

A parallax is then calculated (step 37). That is, the parallax calculation means 18 calculates a parallax between a left eye image and a right eye image for each parallax calculation area previously determined on the basis of the composite depth map produced by the composite map production means 17.

The foregoing steps 34 to 37 correspond to operations in the CID method.

The parallax is then corrected (step 38). That is, the parallax calculation means 18 corrects the parallax for each of the parallax calculation areas calculated at the step 37 depending on the amount of movement of the motion vector calculated by the movement amount calculation means 9. Specifically, the parallax calculated at the step 37 is reduced by a parallax corresponding to the delay of the delayed image from the reference image.

The left eye image L and the right eye image R are then horizontally shifted depending on the parallax after the correction (step 39). That is, the horizontal position setting means 19 horizontally shifts the left eye image L and the right eye image R outputted from the switching means 11 for each pixel, for example, on the basis of the parallax corrected at the step 38.

The left eye image L and the right eye image R which have been horizontally shifted by the horizontal position setting means 19 are displayed by the stereoscopic display means 3 (step 40).

[3] Description of Third Embodiment

Referring now to FIGS. 15 to 23, a third embodiment of the present invention will be described.

FIG. 15 illustrates the configuration of a device for converting a two-dimensional video into a three-dimensional video (a 2D/3D converting device).

In FIG. 15, reference numeral 101 denotes a video supply source serving as video signal supply means such as a VTR, a CD-ROM, or TV broadcasting, reference numeral 102 denotes a 2D/3D converting device for converting the two-dimensional video signal supplied from the video supply source 101 into a three-dimensional video signal, that is, a left eye video signal L and a right eye video signal R, and reference numeral 103 denotes stereoscopic display means using an image splitter system or the like for displaying a three-dimensional video signal outputted from the 2D/3D converting device 102.

Description is made of the configuration of the 2D/3D converting device 102.

Reference numeral 104 denotes a field memory storing the video signal from the video supply source 101 for each field, and reference numeral 105 denotes motion vector detection means for detecting a motion vector from the video signal supplied from the video supply source 101.

Reference numeral 106 denotes image feature extraction means for extracting for each area image features such as a chrominance component, a contrast, and a high-frequency component from the video supplied from the video supply source 101.

Reference numeral 110 denotes delay calculation means for calculating a delay from the motion vector detected by the motion vector detection means 105. Reference numeral 107 denotes memory control means for reading out from the field memory 104 an image (a delayed image) which is delayed by the number of fields corresponding to the delay calculated by the delay calculation means 110 using the current input image as a basis. Reference numeral 108 denotes switching means for switching as to which of the input image and the delayed image should be taken as a left eye video signal L or a right eye video signal R on the basis of the direction of movement outputted from the delay calculation means 110.

Reference numeral 109 denotes depth estimate calculation means for calculating, on the basis of the image features for each area extracted by the image feature extraction means 106, a depth estimate for the area. Reference numeral 111 denotes parallax calculation means for calculating, on the basis of the depth estimate for each area calculated by the depth estimate calculation means 109, a parallax (a phase) produced by the CID method for the area, and correcting, on the basis of a parallax produced by the MTD method which is outputted from the delay calculation means 110, the parallax produced by the CID method, to calculate an integrated parallax (an integrated phase).

Reference numeral 113 denotes stereoscopic video synthesis means for shifting respective areas (for example, pixel units) in the right and left eye images outputted from the switching means 108 in the horizontal direction and synthesizing the areas on the basis of the integrated parallax calculated by the parallax calculation means 111.

Reference numeral 112 denotes parallax monitoring means for controlling a delay on the basis of the integrated parallax calculated by the parallax calculation means 111 and the parallax produced by the MTD method which is outputted from the delay calculation means 110.

In the present embodiment, a stereoscopic space is reproduced, taking the depth estimate obtained by the CID method as a base. That is, a stereoscopic video obtained by adding the occlusion produced by the MTD method to the CID method is presented. As a specific method, a phase (a parallax: a phase produced as a result by a field delay) calculated by the MTD method is subtracted from a phase (a parallax) for each area calculated by the CID method so that the phase for the area is equal to the phase calculated by the CID method even after the MTD method and the CID method are simultaneously used. Therefore, the phases produced by the MTD method and the CID method are controlled by the following priorities:

Priority 1: the maximum range Urange [pixel] of a phase set by a user

Priority 2: the limit of an image distortion h_supp_lev [pixel] caused by phase shift in a conversion image

Priority 3: a depth shape (the shape of a depth estimate) estimated by the CID method

Priority 4: a phase dly_sisa [pixel], produced by the MTD method, which does not exceed Urange

Priority 5: a phase [pixel] produced by the CID method

Description is now made of the meanings of the priorities.

The highest priority 1 ensures that the integrated phase does not exceed the maximum range Urange of the phase set by the user.

The priority 2 ensures that the distortion of an image produced by integrated phase control (particularly, by the CID method) is within a given limit value (within h_supp_lev).

The priority 3 means that the depth estimate (depth shape) for each area calculated by the CID method is maintained even after the MTD method and the CID method are simultaneously used.

The priority 4 ensures that the parallax produced by the MTD method does not exceed Urange.

The lowest priority 5 means that, because the CID method is used together with the MTD method, the phase produced by the CID method takes a value different from the phase that the CID method alone would produce.

FIG. 16 shows the procedure for integrated phase control processing with a depth shape estimated by the CID method maintained. FIG. 17 illustrates the behavior of a phase at the time of execution.

Image features are first extracted by the image feature extraction means 106 (step 51). The depth estimate calculation means 109 estimates a depth by the CID method on the basis of the image features extracted by the image feature extraction means 106 (step 52). That is, a calculated frequency, a contrast, a weight on a composition, and a weight on the results of subject/background judgment are added together at a suitable ratio, to find a depth estimate.

In the integrated phase control, the CID method is also used for a moving picture. Accordingly, the addition ratio is made variable depending on the speed of the movement in a video. Specifically, in order to compensate for the tendency of the value of a high-frequency component to decrease with rapid movement, the addition ratio of the high-frequency component is reduced as the speed of the movement increases.

The depth estimate thus found is subjected to distance scale conversion (complete distance scale conversion or polygonal line distance scale conversion) within Ufront and Urear, to find a phase for each area produced by the CID method (step 53). A difference phase is found by subtracting a phase produced by the MTD method (MTD phase) (=a field delay × the value of a horizontal motion vector in the area) from the phase produced by the CID method (CID phase) (step 54). The difference phase is subjected to distortion suppression processing such that the difference between the phases for the adjacent areas is not more than h_supp_lev [pixel] (step 55).

The reason why a right end of the MTD phase and a left end of the CID phase are overlapped with each other in the difference phase (= CID phase − MTD phase) shown in FIG. 17 is that the phases produced by both the methods differ for each area. This is apparent from the behavior of the difference phase ph_diffj (= ph_cidj − ph_mtdj) for each area, obtained by subtracting the phase ph_mtdj produced for the area by the MTD method from the phase ph_cidj produced for the area by the CID method, as shown in FIG. 18. Here, j indicates an area number.

In FIG. 18, the values in three columns and four rows on the upper side respectively indicate the phases in the respective areas, and on the lower side the phases in the respective areas are arranged in a single row so as to be visually understandable.

The phase after the distortion suppression processing shown in FIG. 17 indicates that the difference phase has been subjected to the distortion suppression processing. The maximum projection phase ufront [pixel] and the maximum recession phase urear [pixel] of a phase obtained by integrating the MTD method and the CID method (an integrated phase) after the distortion suppression processing are found by the loop processing shown in FIG. 19.

FIG. 19 shows the details of the processing at the steps 53, 54, and 55 shown in FIG. 16. The processing is performed by the parallax calculation means 111.

Ufront and Urear set by the user are first respectively set to the variables Ufront′ and Urear′ (step 61), and the depth estimates are then subjected to distance scale conversion in a dynamic range defined by Ufront′ and Urear′, to obtain a CID phase (step 62). A tentative difference phase obtained by subtracting an MTD phase from the CID phase is then found (step 63). The maximum value h_dv_max [pixel] of the phase difference between the adjacent areas found from the tentative difference phase (the maximum value of the difference in the phase difference between the adjacent areas) is found (step 64). The program then proceeds to the step 65.
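A possible reading of step 64 in code (Python): the tentative difference phases are laid out per parallax calculation area, and the largest gap between adjacent areas is taken. Whether adjacency is horizontal only or also vertical is not spelled out here; the sketch checks both, which is an assumption:

# Hedged sketch of step 64: maximum difference-phase gap between adjacent
# areas. ph_diff lists the tentative difference phase [pixel] per area in
# row-major order; cols is the number of areas per row (assumption).

def max_adjacent_diff(ph_diff, cols):
    rows = len(ph_diff) // cols
    h_dv_max = 0.0
    for r in range(rows):
        for c in range(cols):
            j = r * cols + c
            if c + 1 < cols:   # horizontal neighbour
                h_dv_max = max(h_dv_max, abs(ph_diff[j] - ph_diff[j + 1]))
            if r + 1 < rows:   # vertical neighbour
                h_dv_max = max(h_dv_max, abs(ph_diff[j] - ph_diff[j + cols]))
    return h_dv_max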

When the maximum value h_dv_max [pixel] of the phase difference between the adjacent areas is not within the distortion allowable range h_supp_lev [pixel], as described later, the dynamic range is reduced such that the phase difference between the adjacent areas falls within the distortion allowable range. Thereafter, the processing at the foregoing steps 62, 63, and 64 is performed again.

It is judged at the step 65 whether or not, in a case where such loop processing has been performed, the maximum value h_dv_max [pixel] of the phase difference calculated at the previous step 64 is smaller than the maximum value h_dv_max [pixel] of the phase difference calculated at the current step 64.

At the time point where the loop processing has not yet been performed, the answer at the step 65 is in the negative, so that it is judged whether or not the maximum value h_dv_max [pixel] of the phase difference calculated at the current step 64 is within the distortion allowable range h_supp_lev [pixel] (step 66). If it is within the range, the tentative difference phase is taken as a true target phase (step 72).

Conversely, if it is outside the range, it is judged whether or not the number of loops is within the limit number of loops set in order to reduce a load on a CPU (step 67). When the number of loops is larger than the limit number of loops, the tentative difference phase is subjected to forced distortion suppression processing, described later, to find a true target phase (step 73).

Furthermore, when the number of loops is smaller than the limit number of loops, the tentative difference phase is caused to retreat (step 68), and the dynamic range defined by Ufront′ and Urear′ is gradually reduced until the phase difference between the adjacent areas is not more than h_supp_lev, to obtain the most suitable values of ufront and urear (step 69). This processing shall be hereinafter referred to as sequential search processing. The details of the sequential search processing will be described later.

The distance ratio of ufront to urear, which are found in the sequential search processing, is changed into the distance ratio designated by the user (step 70). The obtained ufront and urear are set to Ufront′ and Urear′, to change the dynamic range (step 71). Thereafter, the program is returned to the step 62, where the distance scale conversion is carried out again.

The series of processing at the step 62 to the step 71 is repeated until the maximum value h_dv_max [pixel] of the phase difference between the adjacent areas is within the distortion allowable range h_supp_lev [pixel], or the processing is interrupted halfway, to obtain a final target phase.
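The whole loop of steps 61 to 71, with its two exits at steps 72/73 and the interruption at step 74, can be sketched as follows (Python; not the literal implementation). distance_scale() is a deliberately simplified linear stand-in for the complete or polygonal-line distance scale conversion described earlier; max_adjacent_diff() is the step 64 sketch above, while forced_suppression_range() and sequential_search() are sketched after equation (21) and after the numbered search procedure below. apply_user_ratio(), standing for steps 70 and 71, is a hypothetical helper:

# Hedged sketch of steps 61-71 in FIG. 19.

def distance_scale(depth, ufront, urear):
    # Simplified linear stand-in: depth estimate 0-255 -> phase [urear, ufront].
    return [urear + (ufront - urear) * d / 255.0 for d in depth]

def integrated_phase_loop(depth, ph_mtd, cols, Ufront, Urear,
                          h_supp_lev, max_loops, apply_user_ratio):
    uf, ur = Ufront, Urear                                   # step 61
    prev_max, saved_diff = float('inf'), None
    for loop in range(max_loops + 1):
        ph_cid = distance_scale(depth, uf, ur)               # step 62
        ph_diff = [c - m for c, m in zip(ph_cid, ph_mtd)]    # step 63
        h_dv_max = max_adjacent_diff(ph_diff, cols)          # step 64
        if h_dv_max >= prev_max:                             # step 65 -> step 74
            uf, ur = forced_suppression_range(
                max(saved_diff), min(saved_diff), prev_max, h_supp_lev)
            return distance_scale(depth, uf, ur)
        if h_dv_max <= h_supp_lev:                           # step 66
            return ph_diff                                   # step 72
        if loop == max_loops:                                # step 67 -> step 73
            uf, ur = forced_suppression_range(
                max(ph_diff), min(ph_diff), h_dv_max, h_supp_lev)
            return distance_scale(depth, uf, ur)
        saved_diff = ph_diff                                 # step 68 (retreat)
        uf, ur = sequential_search(depth, ph_mtd, cols,
                                   uf, ur, h_supp_lev)       # step 69
        uf, ur = apply_user_ratio(uf, ur)                    # steps 70-71
        prev_max = h_dv_max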

The two types of interruption of the loop processing in FIG. 19 will be successively described.

First, the first interruption occurs when the number of loops reaches the limit number of loops set in order to reduce the CPU load at the step 67. When the interruption occurs under this condition, the tentative difference phase is subjected to distortion suppression processing on a pixel scale, which is synonymous with the foregoing equation (6), as shown in the following equation (21), to determine the values of ufront and urear, thereby range-converting a depth estimate within a range defined thereby:

ufront = df_ufront × h_supp_lev/h_dv_max   for h_dv_max > h_supp_lev
urear = df_urear × h_supp_lev/h_dv_max   for h_dv_max > h_supp_lev   (21)

Here, df_ufront and df_urear respectively indicate the maximum value and the minimum value of the tentative difference phase, which shall be acquired at the step of calculating the maximum value of the difference in the phase difference between the adjacent areas. In such a way, the difference phase shall be accommodated within the newly found range. There is no problem even if ufront and urear in the foregoing equation (21) are subjected to the distance ratio maintenance processing expressed by the following equation (22):

(255 − Dlev) : Dlev = {lev(ufront) − Dlev} : {Dlev − lev(urear)}   (22)
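A sketch of equation (21) in code (Python); df_ufront and df_urear are the maximum and minimum of the tentative difference phase as stated above. The distance ratio maintenance of equation (22) is noted only as a comment, since it depends on Dlev and the lev() inverse mapping defined earlier in the document:

# Hedged sketch of the forced distortion suppression of equation (21).

def forced_suppression_range(df_ufront, df_urear, h_dv_max, h_supp_lev):
    """Shrink the range so the worst adjacent-area gap just fits h_supp_lev."""
    if h_dv_max > h_supp_lev:
        return (df_ufront * h_supp_lev / h_dv_max,
                df_urear * h_supp_lev / h_dv_max)
    return df_ufront, df_urear

# The resulting ufront/urear may further be adjusted so that
# (255 - Dlev) : Dlev == (lev(ufront) - Dlev) : (Dlev - lev(urear)),
# the distance ratio maintenance of equation (22).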

The second interruption occurs when, in a case where the loop processing at the step 62 to the step 71 is performed, it is judged at the step 65 that the maximum value h_dv_max [pixel] of the phase difference calculated at the previous step 64 is smaller than the maximum value h_dv_max [pixel] of the phase difference calculated at the current step 64.

That is, this interruption occurs when the maximum value h_dv_max of the phase difference between the adjacent areas in the current loop is not smaller than the value obtained in the previous loop, irrespective of the fact that the dynamic range has been sufficiently reduced. This happens because a phase produced by the MTD method is not changed by the distortion suppression processing. That is, when the difference between the phases of a subject and a background which are produced by the MTD method is large, the reduction of the dynamic range is blocked by the phase difference between the MTD phases even if the number of loops is increased, so that the phase difference never becomes not more than h_supp_lev.

In such a case, the processing is interrupted, to change the dynamic range by the same processing as that at the step 73 (step 74). However, in this case, the dynamic range is changed with respect to the tentative retreat difference phase which was caused to retreat at the step 68. The dynamic range is changed with respect to the tentative retreat difference phase in order to reduce the tendency of the phase produced by the MTD method to affect the shape of the difference phase, and of the dynamic range to shrink with respect to the difference phase, every time the distortion suppression loop is repeated.

However, such a method is symptomatic treatment. Accordingly, the frequency of occurrence of the forced distortion suppression processing caused by the phase produced by the MTD method cannot be fundamentally reduced.

In order to reduce the frequency of occurrence of such a phenomenon, the phase difference in the MTD phase itself between the adjacent areas must be reduced. In the integrated phase control, therefore, the MTD phase ph_mtdj used for each area is a value obtained by smoothing the parallax which the area integrally has (= a field delay × the value of a horizontal motion vector in the area) together with the parallaxes of the adjacent areas.
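For illustration, the sketch below (Python) computes such a smoothed MTD phase. A three-tap average over the horizontal neighbours is an assumption, since the document only says that the parallax the area integrally has is smoothed together with those of the adjacent areas:

# Hedged sketch of the per-area MTD phase used in the integrated phase control.

def smoothed_mtd_phase(xvec, delay, cols):
    """xvec: horizontal motion vector [pixel/field] per area; delay: field delay."""
    raw = [v * delay for v in xvec]     # parallax the area integrally has
    out = []
    for j, p in enumerate(raw):
        c = j % cols
        neigh = [p]
        if c > 0:
            neigh.append(raw[j - 1])
        if c + 1 < cols:
            neigh.append(raw[j + 1])
        out.append(sum(neigh) / len(neigh))
    return out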

In order to reduce the frequency of occurrence of the forced distortion suppression processing, the shapes of the phases respectively produced by the MTD method and the CID method must be similar to each other. Therefore, the depth estimation is performed in consideration of the results of subject/background judgment such that, in the integrated phase, the CID phase is also increased in an area where the MTD phase is increased, as shown in FIG. 16.

The sequential search processing using the dynamic range at the step 69 shown in FIG. 19 will now be described.

(1) The adjacent areas between which there is the maximum phase difference in the difference phase are determined.

(2) The direction of search is determined. Specifically, the direction of search is determined depending on the respective CID phases in the two areas, determined in (1), between which there is the maximum phase difference.

(3) The values of ufront and urear are brought near to the value of the screen.

(4) The two areas are subjected to distance scale conversion using a dynamic range defined by the updated values of ufront and urear, to calculate the CID phases in the two areas.

(5) A difference phase (= CID phase − MTD phase) in each of the two areas is calculated.

(6) A phase difference h_dv_max in the difference phase between both the areas is calculated.

(7) The phase difference h_dv_max found in (6) is judged in the following order:

1) When h_dv_max is not more than h_supp_lev, the processing is terminated.

2) When h_dv_max is more than the h_dv_max in the previous loop, the processing is terminated, taking the values of ufront and urear used in the previous loop as the values to be found.

3) When h_dv_max is more than h_supp_lev, the program returns to (3).
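Read as code, the procedure (1) to (7) might look like the following sketch (Python). The step size by which ufront and urear are pulled toward the screen, and the sign test used to pick the search direction in (2), are assumptions; cid_phase() is the same linear stand-in for the distance scale conversion used above:

# Hedged sketch of the sequential search processing at step 69.

def cid_phase(d, ufront, urear):
    return urear + (ufront - urear) * d / 255.0   # linear stand-in

def worst_adjacent_pair(ph_diff, cols):
    """(1): indices of the adjacent areas with the largest gap."""
    best, pair = -1.0, (0, 1)
    for j in range(len(ph_diff)):
        r, c = divmod(j, cols)
        for k in ([j + 1] if c + 1 < cols else []) + \
                 ([j + cols] if j + cols < len(ph_diff) else []):
            gap = abs(ph_diff[j] - ph_diff[k])
            if gap > best:
                best, pair = gap, (j, k)
    return pair

def sequential_search(depth, ph_mtd, cols, ufront, urear,
                      h_supp_lev, step_px=0.5):
    ph_diff = [cid_phase(d, ufront, urear) - m for d, m in zip(depth, ph_mtd)]
    ja, jb = worst_adjacent_pair(ph_diff, cols)               # (1)
    prev, prev_uf, prev_ur = float('inf'), ufront, urear
    while True:
        pa = cid_phase(depth[ja], ufront, urear)
        pb = cid_phase(depth[jb], ufront, urear)
        # (2)/(3): pull toward the screen the end(s) the two CID phases lean on
        if max(pa, pb) > 0:
            ufront = max(0.0, ufront - step_px)
        if min(pa, pb) < 0:
            urear = min(0.0, urear + step_px)
        if ufront == 0.0 and urear == 0.0:
            return ufront, urear   # range exhausted (guard for the sketch)
        # (4)-(6): re-convert the two areas and measure their gap
        da = cid_phase(depth[ja], ufront, urear) - ph_mtd[ja]
        db = cid_phase(depth[jb], ufront, urear) - ph_mtd[jb]
        h_dv_max = abs(da - db)
        # (7): termination tests in the stated order
        if h_dv_max <= h_supp_lev:                            # 1)
            return ufront, urear
        if h_dv_max > prev:                                   # 2)
            return prev_uf, prev_ur
        prev, prev_uf, prev_ur = h_dv_max, ufront, urear      # 3) loop again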

A method of controlling a parallax (a phase) produced by the MTD method, which is carried out by the parallax monitoring means 112, will now be described.

In the integrated phase control with the stereoscopic reproducibility of the CID method maintained, subject position control is not used in the MTD method. Therefore, the phase produced by the MTD method may, in some cases, exceed the maximum projection phase Ufront [pixel] and the maximum recession phase Urear [pixel] which are previously determined by the user. The behavior of the phase in a case where such a phenomenon occurs is illustrated in FIG. 21. An OK mark and an NG mark at the right end in FIG. 21 respectively indicate that an integrated phase, which is the sum of the MTD phase and the difference phase, is within the dynamic range Urange previously determined by the user, and that the integrated phase exceeds the dynamic range Urange.

In an NG case, the following problems arise.

When Urear is approximately the same as the distance between the eyes, a depth which is not less than the distance between the eyes cannot be defined in a distance scale. Further, when an NG phenomenon is maintained even after distortion suppression processing, the integrated phase cannot maintain the principle of stereoscopic reproduction within Urange, which is its major premise.

In order to solve such a problem, the parallax Mdly_sisa for determining the stereoscopic effect produced by the MTD method can be previously set to a small value so that no NG phenomenon occurs. However, this method can hardly be called preferable, because the stereoscopic effect produced by the MTD method is lost. Therefore, control is required in which the occurrence of the NG phenomenon is tolerated to some extent as the price of increasing Mdly_sisa, and the target delay dly_target is reduced only when a phase exceeding Ufront and Urear is produced (see FIG. 22).

In order to suppress the phase within Urange by this method, values whose absolute values are smaller than Ufront and Urear, which are the values set by the user, must be used internally as Ufront and Urear from the beginning, in anticipation of the surplus of the phase over Urange which is produced when the MTD method is simultaneously used. Further, in a method of converting the distance scale using a parallax conversion table, a phase outside the conversion table must be so rounded as to be accommodated in the conversion table.
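For example (Python sketch; the size of the anticipated surplus, here mtd_margin, is an assumption, as the document does not quantify it):

# Hedged sketch: internal phase limits tightened in anticipation of the MTD
# surplus, and a phase rounded into the parallax conversion table's domain.

def internal_limits(Ufront, Urear, mtd_margin):
    # |internal values| < |user values|; Ufront >= 0 >= Urear assumed.
    return Ufront - mtd_margin, Urear + mtd_margin

def round_into_table(phase, table_min, table_max):
    # Clamp a phase that falls outside the conversion table.
    return min(max(phase, table_min), table_max)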

FIG. 23 shows the procedure for the control processing (the procedure for the control processing performed by the parallax monitoring means 112) for realizing the processing shown in FIG. 22.

In FIG. 23, a target delay is reduced when the integrated phase for each area (the sum of a real phase and a phase produced by a real delay) exceeds Ufront and Urear.

Therefore, the respective phases produced by the MTD method in a subject area and a background area must be calculated for each field (step 81). The phase in the current field is calculated using a real phase phase [pixel] and a real delay delay [field] in order to improve precision.

In actual control, a real parallax obj_sisa′ [pixel] in the subject area and a real parallax bg_sisa′ [pixel] in the background area, which are produced by a field delay in the MTD method, and a real parallax ng_sisa′ [pixel] in an NG area (an area for which it is not clear to which of the areas it belongs) are found by the following equation (23):

obj_sisa′ = obj_vect × delay
bg_sisa′ = bg_vect × delay
ng_sisa′ = (obj_sisa′ + bg_sisa′)/2   (23)

As expressed by the following equation (24), the real parallax in each of the areas and a real phase rph_diffj [pixel], obtained by smoothing the true target phase in the area on a time basis, are added together, thereby finding a real integrated phase u_phasej [pixel] in the area (step 82):

u_phasej = obj_sisa′ + rph_diffj   for a subject area
u_phasej = bg_sisa′ + rph_diffj   for a background area
u_phasej = ng_sisa′ + rph_diffj   for an NG area   (24)

In order to determine whether the real integrated phase u_phasej is within the range from Ufront to Urear set by the user, a phase over_phasej [pixel], indicating how far the phase in the area departs from the range set by the user, is found by the following equation (25) when the real integrated phase is outside the range (step 83):

over_phasej = u_phasej − Ufront   for u_phasej > Ufront
over_phasej = −(u_phasej − Urear)   for Urear > u_phasej
over_phasej = 0   for Ufront ≧ u_phasej ≧ Urear   (25)
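Steps 81 to 83 may be sketched as follows (Python). obj_vect and bg_vect are the horizontal motion vectors [pixel/field] of the subject and background; the area_kind labelling is an illustrative device, as the document only distinguishes subject, background, and NG areas:

# Hedged sketch of equations (23)-(25).

def over_phases(obj_vect, bg_vect, delay, rph_diff, area_kind, Ufront, Urear):
    obj_sisa = obj_vect * delay                       # eq. (23)
    bg_sisa = bg_vect * delay
    ng_sisa = (obj_sisa + bg_sisa) / 2.0
    sisa = {"subject": obj_sisa, "background": bg_sisa, "ng": ng_sisa}
    over = []
    for j, kind in enumerate(area_kind):
        u_phase = sisa[kind] + rph_diff[j]            # eq. (24)
        if u_phase > Ufront:                          # eq. (25)
            over.append(u_phase - Ufront)
        elif u_phase < Urear:
            over.append(-(u_phase - Urear))
        else:
            over.append(0.0)
    return over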

The maximum value over_maxp [pixel] of the phase over_phasej over the areas constituting one screen is then found, and target delay suppression processing for reducing the target delay is performed when over_maxp is not zero (step 84).

In the target delay suppression processing, the phase over_maxp found by the foregoing equation (25) is first subtracted from the absolute value of the real parallax dly_sisa′ [pixel] in the current field which is produced by the field delay, to find a parallax dly_sisa″ which can be produced by the MTD method, by the following equation (26):

dly_sisa″ = |dly_sisa′| − over_maxp = |obj_sisa′ − bg_sisa′| − over_maxp   (26)

The target delay dly_target′ which is suppressed on the basis of the parallax dly_sisa″ is found by the following equation (27):

dly_target′ = dly_sisa″/(obj_xvec − bg_xvec) [field]   (27)

A method in which the transition speed of the real delay changes depending on the difference between the real delay and a target delay is provided, so that the target delay dly_target′ is compared with the target delay dly_target before the suppression, and the smaller of them is taken as the final target delay dly_target″ after the suppression. That is, the final target delay dly_target″ after the suppression is expressed by the following equation (28):

dly_target″ = delay − 1   for 0 < delay < dly_target′
dly_target″ = delay + 1   for 0 > delay > dly_target′
dly_target″ = dly_target′   for |delay| > |dly_target′|   (28)
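Finally, the target delay suppression of equations (26) to (28) in code (Python sketch; reading "the smaller of them" as a comparison of magnitudes, and the guard against equal motion vectors, are assumptions):

# Hedged sketch of the target delay suppression, equations (26)-(28).

def suppressed_target_delay(obj_sisa, bg_sisa, over_maxp,
                            obj_xvec, bg_xvec, delay, dly_target):
    dly_sisa2 = abs(obj_sisa - bg_sisa) - over_maxp          # eq. (26)
    if obj_xvec == bg_xvec:
        return dly_target        # guard: no relative motion, keep the target
    dly_target1 = dly_sisa2 / (obj_xvec - bg_xvec)           # eq. (27) [field]
    # take the smaller (in magnitude) of the suppressed and original targets
    if abs(dly_target) < abs(dly_target1):
        dly_target1 = dly_target
    # eq. (28): step the real delay toward the target, or adopt the target
    if 0 < delay < dly_target1:
        return delay - 1
    if 0 > delay > dly_target1:
        return delay + 1
    return dly_target1           # |delay| > |dly_target1|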

Although the phase produced by the MTD method is suppressed here using the real delay and the real parallax, it can also be suppressed using the target phase and the target delay when a load on the CPU is given priority over precision.

1. A device for converting a two-dimensional video into a three-dimensional video, characterized by comprising: a field memory for storing for each field a two-dimensional video signal inputted from a video signal source; motion vector detection means for detecting for each area of an input video a motion vector corresponding to movement between fields of the inputted video signal; readout means for reading out, out of the video signals stored in the field memory, the video signal delayed from the inputted video signal by a delay found from the motion vector for each area detected by the motion vector detection means from the field memory; switching means for outputting one of the inputted video signal and the video signal read out of the field memory and the other video signal, respectively, as a left eye video signal and a right eye video signal on the basis of the direction of a horizontal component of the motion vector for each area detected by the motion vector detection means; feature extraction means for extracting for each area of the input video a video feature from the inputted video signal; parallax calculation means for calculating, on the basis of the image feature for each area of the input video extracted by the feature extraction means, a depth for the area and calculating a parallax for the area from the calculated depth for the area; parallax correction means for correcting the parallax for each area calculated by the parallax calculation means depending on the magnitude of the motion vector for the area detected by the motion vector detection means; and phase control means for correcting, on the basis of the parallax for each area corrected by the parallax correction means, phases for the area of the right eye video and the left eye video outputted by the switching means, and outputting the videos as a stereoscopic video signal.
2. The device for converting a two-dimensional video into a three-dimensional video according to claim 1, characterized in that the parallax correction means comprises means for calculating for each area a difference parallax obtained by subtracting from the parallax for the area calculated by the parallax calculation means the parallax dependent on the magnitude of the motion vector in the corresponding area, and means for calculating a difference parallax for each area by changing a dynamic range such that the maximum value of the difference in the difference parallax between the adjacent areas is within a predetermined range.
3. The device for converting a two-dimensional video into a three-dimensional video according to claim 2, characterized by comprising means for reducing, when the sum of the difference parallax for each area obtained by the parallax correction means and the parallax dependent on the magnitude of the motion vector in the corresponding area exceeds a predetermined range, a delay by an amount corresponding to the excess parallax.
4. A method of converting a two-dimensional video into a three-dimensional video, characterized by comprising: a first step of storing for each field a two-dimensional video signal inputted from a video signal source in a field memory; a second step of detecting for each area of an input video a motion vector corresponding to movement between fields of the inputted video signal; a third step of reading out, out of the video signals stored in the field memory, a video signal delayed from the inputted video signal by a delay found from the motion vector for each area detected at the second step from the field memory; a fourth step of outputting one of the inputted video signal and the video signal read out of the field memory and the other video signal, respectively, as a left eye video signal and a right eye video signal on the basis of the direction of a horizontal component of the motion vector for each area detected at the second step; a fifth step of extracting for each area of the input video a video feature from the inputted video signal; a sixth step of calculating, on the basis of the image feature for each area of the input video extracted at the fifth step, a depth for the area and calculating a parallax for the area from the calculated depth for the area; a seventh step of correcting the parallax for each area calculated at the sixth step depending on the magnitude of the motion vector for the area detected at the second step; and an eighth step of correcting, on the basis of the parallax for each area corrected at the seventh step, phases for the area of the right eye video and the left eye video outputted at the fourth step, and outputting the videos as a stereoscopic video signal.
5. The method of converting a two-dimensional video into a three-dimensional video according to claim 4, characterized in that the seventh step comprises the steps of calculating for each area a difference parallax obtained by subtracting from the parallax for the area calculated at the sixth step the parallax dependent on the magnitude of the motion vector in the corresponding area, and calculating a difference parallax for each area by changing a dynamic range such that the maximum value of the difference in the difference parallax between the adjacent areas is within a predetermined range.
6. The method of converting a two-dimensional video into a three-dimensional video according to claim 5, characterized by comprising the step of reducing, when the sum of the difference parallax for each area obtained at the seventh step and the parallax dependent on the magnitude of the motion vector in the corresponding area exceeds a predetermined range, a delay by an amount corresponding to the excess parallax.
7. A method of converting a two-dimensional video into a three-dimensional video, characterized by comprising: a first step of extracting an image feature related to the long or short distance of a video from each of a plurality of parallax calculation areas set within one screen on the basis of a two-dimensional video signal, and producing a depth estimate for the parallax calculation area on the basis of the extracted image feature; a second step of subjecting each of the depth estimates to distance scale conversion using a dynamic range defined by a predetermined maximum projection amount and a predetermined maximum recession amount, to find a tentative target phase for each of the parallax calculation areas; a third step of finding the maximum value of a phase difference between the adjacent parallax calculation areas on the basis of the tentative target phase for each of the parallax calculation areas; a fourth step of judging whether or not the maximum value of the phase difference between the adjacent parallax calculation areas is within a predetermined distortion allowable range; and a fifth step of searching, when the maximum value of the phase difference between the adjacent parallax calculation areas is outside the predetermined distortion allowable range, for such a dynamic range that the phase difference between said parallax calculation areas is within the distortion allowable range, subjecting each of the depth estimates to distance scale conversion using the dynamic range searched for, and finding a tentative target phase for each of the parallax calculation areas, to proceed to the third step.
8. The method of converting a two-dimensional video into a three-dimensional video according to claim 7, characterized in that at said fifth step, the dynamic range searched for is corrected such that the ratio of the maximum projection amount to the maximum recession amount, which are defined by the dynamic range, is a predetermined ratio, and each of the depth estimates is then subjected to distance scale conversion using the corrected dynamic range.